I have separated out the code which shapes the variables in the player data and creates the data files needed to run this report. Before running this file, run the file reportcode.Rmd to create all of the data files necessary to knit this report.
#Reading in data files
stry_scale <- read.csv(file = "data/stry_scale.csv")
longral_pos <- read.csv(file = "data/longral_pos.csv")
fed16_longest <- read.csv(file = "data/fed16_longest.csv")
fed16_only <- read.csv(file = "data/fed16_only.csv")
fed16_scale <- read.csv(file = "data/fed16_scale.csv")
fed16_df <- read.csv(file = "data/fed16_df.csv")
fed17_only <- read.csv(file = "data/fed17_only.csv")
fed18_only <- read.csv(file = "data/fed18_only.csv")
fed_scale <- read.csv(file = "data/fed_scale.csv")
stry16_only <- read.csv(file = "data/stry16_only.csv")
stry17_only<- read.csv(file = "data/stry17_only.csv")
stry_scale <- read.csv(file = "data/stry_scale.csv")
fed16_only_plots <- read.csv(file = "data/fed16_only_plots.csv")
dlb_fedonly <- read.csv(file = "data/dlb_fedonly.csv")
iof_fedonly <- read.csv(file = "data/iof_fedonly.csv")
dlw_fedonly <- read.csv(file = "data/dlw_fedonly.csv")
longral_df <- read.csv(file = "data/longral_df.csv")
fed16_pos <- read.csv(file = "data/fed16_pos.csv")
Visual analysis has been used to select a number of new variables from the rallies data set. For the two hidden state model ‘winner’ is the response variable. For the three hidden state model ‘winner.return.error’ is the response variable.
The covariates chosen after visual analysis are “p.start.position.x”, “p.start.position.y”, “oppo.start.position.x”, “oppo.start.position.y”, “p.movement.angle.1”, “p.diff.avg.shot.and.match.movement.speed”, “oppo.diff.avg.shot.and.match.movement.speed”, “diff.p.avg.and.current.shot.speed”, “lag.oppo.height.off.net”
These covariates have been scaled to between 0 and 1 for the analysis.
To identify the best covariates for predicting hidden states of many different players; training models have been created using a step-wise approach to identify the best combination of covariates in a chosen match data set. These training models have then been tested on a different data sets for a comparison of results. The limitation of this step wise approach is that some potentially important covariates have been ommited from the model because of correlation with other covariates. For example shot type, forehand/backhand etc. is correlated with the players x and y coordinates.
For step-wise analysis, I began by running all the covariates to find the covariate which changes the log Likelihood of the model the most. I then added this covariate to the model and ran all the remaining covariates to find the next best covariate. I repeated this process until the model is no longer being improved through the addition of new covariates or until there are no more covariates.
The first training model(Training Model 1) has been made using step-wise for two hidden states for the match data Federer vs Berdych 2016. The best combination of covariates is: p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
The second training model (Training Model 2) has been made using step-wise for two hidden states for the match data Federer vs Berdych 2016/17/18. The best combination of covariates is: p.movement.angle.1, oppo.diff.avg.shot.and.match.movement.speed, p.start.position.y, oppo.start.position.x, oppo.start.position.y, p.diff.avg.shot.and.match.movement.speed, lag.oppo.height.off.net
The third training model (Training Model 3) has been made using step-wise for three hidden states for the match data Strycova vs Garcia 2016/17. The best combination of covariates is: oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, oppo.start.position.y, diff.p.avg.and.current.shot.speed, p.start.position.y, p.diff.avg.shot.and.match.movement.speed, lag.oppo.height.off.net
The best two state model by comparing total change in -logLik is Training Model 2 tested on the match data Federer vs Berdych 2016/17/18.
In the attacking state for this model the probability of hitting a winner is 0.255.
In the returning state the probability of hitting a winner is 0.022.
Interpreting the coefficients of the model suggests that.
An increase in p.movement.angle.1 decreases the probability of hitting a winner. If Federer is running directly away from the net he is less likely to hit a winner.
An increase in oppo.diff.avg.shot.and.match.movement.speed increases the probability of hitting a winner. If the Berdych is running faster during the shot, Federer is more likely to hit a winner.
An increases in p.start.position.y decreases the probability of hitting a winner. As Federer moves further away from the centre of the court he is less likely to hit a winner.
An increases oppo.start.position.x decreases the probability of hitting a winner. Federer is more likely to hit a winner when Berdych starts closer to the net.
An increase in oppo.start.position.y decreases the probability of hitting a winner. As Berdych starts further away from the centre of the court Federer is less likely to hit a winner. This is the opposite of what we were expecting.
An Increase in p.diff.avg.shot.and.match.movement.speed decreases the probability of hitting a winner. If Federer is running faster to hit the shot, he is less likely to hit a winner.
An increase in lag.oppo.height.off.net increases the probability of hitting a winner. If Berdych’s shot passes higher over the net, then Federer is more likely to hit a winner.
The covariate with the greatest impact on Federer being in an attacking state is oppo.start.position.y This indicates that Federer has a high probability of hitting a winner if Berdych starts close to the net.
There were some issues with this model. Repeated running of the depmix model in this report can see the attacking state alternate between being defined as state 1 or state 2. The results remain the same for the probability of hitting a winner, but the corresponding states are flipped. We have to be careful in interpreting the results. The attacking state is the one which has a higher value for Re1.1.
library(depmixS4)
## Loading required package: nnet
## Loading required package: MASS
## Loading required package: Rsolnp
library(ggplot2)
#Data set 2, Training Model 2, 2 states.
ds2.mod2.2s <- depmix(winner ~ 1, transition = ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 2, family=multinomial("identity"))
ds2.fm2.2s <- fit(ds2.mod2.2s)
## iteration 0 logLik: -193.3407
## iteration 5 logLik: -191.8366
## iteration 10 logLik: -187.6157
## iteration 15 logLik: -183.4945
## iteration 20 logLik: -181.7833
## iteration 25 logLik: -180.798
## iteration 30 logLik: -179.9897
## iteration 35 logLik: -179.4552
## iteration 40 logLik: -179.2184
## iteration 45 logLik: -179.112
## iteration 50 logLik: -178.9808
## iteration 55 logLik: -178.5362
## iteration 60 logLik: -177.5992
## iteration 65 logLik: -176.8534
## iteration 70 logLik: -176.2929
## iteration 75 logLik: -175.8031
## iteration 80 logLik: -175.392
## iteration 85 logLik: -175.0764
## iteration 90 logLik: -174.8127
## iteration 95 logLik: -174.5302
## iteration 100 logLik: -174.1384
## iteration 105 logLik: -173.5179
## iteration 110 logLik: -172.5598
## iteration 115 logLik: -171.8537
## iteration 120 logLik: -171.3753
## iteration 125 logLik: -170.8761
## iteration 130 logLik: -170.339
## iteration 135 logLik: -169.9635
## iteration 140 logLik: -169.5568
## iteration 145 logLik: -169.3549
## iteration 150 logLik: -169.227
## iteration 155 logLik: -169.135
## iteration 160 logLik: -169.0844
## iteration 165 logLik: -169.0464
## iteration 170 logLik: -169.0023
## iteration 175 logLik: -168.9841
## iteration 180 logLik: -168.9591
## iteration 185 logLik: -168.938
## iteration 190 logLik: -168.9197
## converged at iteration 193 with logLik: -168.9193
summary(ds2.fm2.2s)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2
## (Intercept) 0 151.81686
## p.movement.angle.1 0 52.65853
## oppo.diff.avg.shot.and.match.movement.speed 0 -222.42661
## p.start.position.y 0 15.27117
## oppo.start.position.x 0 -232.38970
## oppo.start.position.y 0 111.40105
## p.diff.avg.shot.and.match.movement.speed 0 325.01427
## lag.oppo.height.off.net 0 356.70799
## Probalities at zero values of the covariates.
## 1.166201e-66 1
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2
## (Intercept) 0 -52.064193
## p.movement.angle.1 0 6.128489
## oppo.diff.avg.shot.and.match.movement.speed 0 -3.720033
## p.start.position.y 0 5.008703
## oppo.start.position.x 0 58.532686
## oppo.start.position.y 0 3.594083
## p.diff.avg.shot.and.match.movement.speed 0 5.419555
## lag.oppo.height.off.net 0 -15.735508
## Probalities at zero values of the covariates.
## 1 2.447982e-23
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0.745 0.255
## St2 0.978 0.022
In attempting to create a three state model:
Firstly I ran Training Model 1 and Training Model 2 as three state models with ‘winner.return.error’ as the response variable. This was unsuccessful in identifying three states, at the zero values of the covariates only two states can be clearly identified.
I then ran the three state models with the p.advantage as the second response variable. The reasoning for this is to observe the intercept of player advantage in different states and to identify players who try to hit a winner when at a disadvantage. This may provide more insight into player shot intentions.
I created a new three state test model with these two response variables for the data set Strycova vs Garcia 2016/17. For the model summaries, at the zero values of the covariates, three clear states can be seen. However with repeated running of these models, the results are not repeatable. The models can be viewed in the appendix.
The three state model may be useful for identifying the probability of the player hitting a winner or an error from an attacking, returning or a defensive state. A larger probability of hitting a winner from a defensive state compared to other players may be an indicator of player risk taking.
We can see from the plots below that players are hitting winners when at a disadvantage. For the purpose of interpreting the intercept of player advantage as a response variable, I left the data unscaled when running the model.
#Plotting proportion of winners by player advantage.
ggplot(fed16_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2016 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed17_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2017 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed18_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Federer 2018 Winners by by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(stry16_only, aes(x=p.advantage, y=winner)) + geom_smooth() + xlab("Player Advantage") + ylab("Proportion of Winners") + ggtitle("Strycova 2016 Winners by Player Advantage")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Using the package GGAnimate and a top-down court outline taken from “https://github.com/mvparrot/vis-serve/blob/master/report file helper basic plots”. The variables p.start.position.x, p.end.position.x, oppo.start.position.x, oppo.end.position.x, p.start.position.y, p.end.position.y, oppo.start.position.y, oppo.end.position.y, start.x, start.y, projected.ballmark.x and projected.ballmark.y
transition_reveal has been used for filling in gaps in the player and ball position data. To run transition_reveal, a single column was created for the player’s start & end position x coordinates. They are grouped by rally number and shot number and alternate between start and end position. The same thing is done for the player’s position y coordinates. As well as for the opponent’s start and end position x and y coordinates. For animating the ball position, similar columns are created alternating between the start position of the ball and the projected ballmark for both x and y coordinates.
We can take the implied states from the Depmix Model probs and overlay them into the animation to visualise when the player is in an attacking or returning state. In this case we are looking at the longest rally between Federer and Berdych in 2016. Federer’s icon will get larger to very clearly indicate when he is an attacking state. State 2 indicates the player being in an attacking state and State 1 a returning state.
For this example I have taken the state probabilities from the Fitted Depmix model for Federer vs Berdych 2016 and created a vector of implied states for the longest rally. I have defined Federer’s implied states when Berdych is hitting the ball as state 1. As we are running the animation using columns that contain start and end positions, the player state for each shot is duplicated for start and end position.
library(gganimate)
library(tweenr)
## Warning: package 'tweenr' was built under R version 3.5.2
library(transformr)
## Warning: package 'transformr' was built under R version 3.5.2
#Top Down Court View
#
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:MASS':
##
## select
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
#--- Packages Required
# require(ggplot2)
# require(plotly)
#--- Outline of the court
court_trace <- data.frame(x = c(-11.89, -11.89, 0, 0, 0, 11.89, 11.89, -11.89, -11.89, 11.89, 11.89, -11.89, -6.4, -6.4, 6.4, 6.4, 6.4, -6.4),
y = c(5.49, -5.49, -5.49, 5.49, -5.49, -5.49, 5.49, 5.49, 4.115, 4.115, -4.115, -4.115, -4.115, 4.115, 4.115, -4.115, 0, 0),
z = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
net_trace <- data.frame(x = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
y = c(-5.49,-5.49, -6.4, -6.4, -5.49, 0, 5.49, 6.4, 6.4, 5.49, 5.49),
z = c(1.07, 0, 0, 1.07, 1.07, 0.914, 1.07, 1.07, 0, 0, 1.07))
service_trace <- data.frame(x = c(-8, 0, 0, 0, -6.4, -6.4, 0, -6.4, -6.4, -6.4, -6.4, -6.4, 0, 0, -8),
y = c(-5.49, -5.49, -4.115, 4.115, 4.115, 0, 0, 0, -4.115, -5.49, 5.49, -4.115, -4.115, 5.49, 5.49),
z = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ,0 ,0))
axis_labels <- data.frame(x.break = c(-21.89:-11.89, -6.4, 0, 6.4, 11.89),
x.label = c("-10m","","","","","-5m","","","","",
"Baseline","Service Line","Net","Service Line","Baseline"),
y.break = c(-5.49,-4.115,0,4.115,5.49),
y.label = c("Doubles", "Singles","Centre","Singles","Doubles"),
z.break = c(0,0.992,2,3,4),
z.label = c("Ground", "Net", "2m", "3m", "4m"))
#--- Top down court view
court_topdown <- ggplot() +
labs(x = "x direction", y = "y direction") +
scale_x_continuous(breaks = axis_labels$x.break,
labels = axis_labels$x.label) +
scale_y_continuous(breaks = axis_labels$y.break,
labels = axis_labels$y.label) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed()
p.state <- c(1,1,1,1,1,1,1,1,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,1,1,2,2)
p.state <- as.data.frame(p.state)
fed16_longest <- cbind(p.state, fed16_longest)
p.rally.anim10 <- ggplot() +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(data= fed16_longest,
aes(x=p.start.position.x, y=p.start.position.y, group = rally.number, colour="red", size = p.state)) +
geom_path(data= fed16_longest,
aes(x=p.start.position.x, y=p.start.position.y, group = rally.number, alpha = shot)) +
geom_point(data= fed16_longest, aes(x=oppo.start.position.x, y=oppo.start.position.y,
group = rally.number, colour="blue")) +
geom_path(data= fed16_longest,aes(x=oppo.start.position.x, y=oppo.start.position.y,
group = rally.number, alpha = shot))+
geom_point(data= fed16_longest,
aes(x=start.x, y=start.y, group = rally.number, colour="green")) +
transition_reveal(pos.rally.count) +
xlab("Player Position X ") +
ylab("Player Position Y") +
ggtitle("Federer 2016 Longest Rally Animation")+
scale_colour_manual(name = "", values=c("red","green","blue"), labels= c("Berdych","Ball","Federer"))
#had to reverse geom_point colour labels for some reason
animate(p.rally.anim10, duration = 30, fps = 10)
## Warning in max(frame): no non-missing arguments to max; returning -Inf
## Warning in max(frame): no non-missing arguments to max; returning -Inf
Confirming choices of covariates through visualising covariates relationship to the response variable.
p.start.position.x- When compared to similar start position x coordinates, Federer appears more likely to hit a winner as he starts the shot closer to the net. Choosing this variable for HMM as we might expect that the probability of Federer being in an attacking state increases as he moves closer to the net( x value increases).
p.start.position.y- Choosing this variable for the HMM as it makes the players start position x coordinates more meaningful when included in the model. We might expect that the probability of Federer being in an attacking state increases as he moves closer to the centre(y value decreases) and to the net. Using absolute value to measure distance from centre.
ggplot(fed16_only_plots, aes(x=p.start.position.x, y=p.start.position.y )) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(aes(alpha = winner)) +
xlab("Federer Start Position X ") + ylab("Federer Start Position Y") +
ggtitle("Federer winners by start position")
ggplot(fed16_only, aes(x=p.start.position.x,winner)) + geom_smooth() + xlab("X Coordinate of Player") + ylab("Proportion of Winners") + ggtitle("Federer Winners By X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only, aes(x=p.start.position.y,winner)) + geom_smooth() + xlab("Distance from Centre of Player") + ylab("Proportion of Winners") + ggtitle("Federer Winners By Distance from Centre")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p.start.shot.x- Similar to Federer’s start position x coordinates, the x coordinates of his shot hit point appears more likely to result in a winner compared to similarly distanced shots as he moves closer to the net.
p.start.shot.y- Including in HMM to make shot x coordinates more meaningful. Using absolute to measures distance from centre.
ggplot(fed16_only_plots, aes(x=p.start.shot.x, y=p.start.shot.y )) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(aes(alpha = winner)) +
xlab("Shot Position X ") + ylab("Shot Position Y") +
ggtitle("Federer winners by shot location")
oppo.start.position.x- Federer appears more likely to hit a winner as Berdych start position x coordinates are closer to the net. (As we are analysing the rallies with all of Berdych’s having positive coordinates in the HMM we would expect a decrease in Berdych’s start position x coordinates to increase The probability of Federer being in an attacking state. oppo.start.position.y
ggplot(fed16_only_plots, aes(x=oppo.start.position.x, y=oppo.start.position.y )) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(aes(alpha = winner)) +
xlab("Opponent Start Position X ") + ylab("Opponent Start Position Y") +
ggtitle("Federer winners by Opponent Start Position")
ggplot(fed16_only, aes(x=oppo.start.position.x,winner)) + geom_smooth() + xlab("X Coordinate of Opponent") + ylab("Proportion of Federer Winners") + ggtitle("Federer Winners By Opponent X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only, aes(x=oppo.start.position.y,winner)) + geom_smooth() + xlab("Distance from Centre of Opponent") + ylab("Proportion of Winners") + ggtitle("Federer Winners by Opponent Distance from Centre")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
p.diff.avg.shot.and.match.movement.speed - When Federer runs approximately more than 1m/s above or below his average movement speed for match he tends to hit more winners. This variable therefore appears suitable for the HMM. (Using an absolute value of the difference we would expect an increase in difference to increase the probability of Federer being in a attacking state. (add in more info) When Considering winners, errors and returns we move back to relative difference as Federer tends to hit more errors than winners when moving approximately 1m/s slower than his average match movement speed
ggplot(fed16_only_plots, aes(x=factor(winner),y=p.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by difference in Federer's current and match avg movement speed")
ggplot(fed16_only_plots, aes(x=factor(winner),y=p.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by difference in Federer's current and match avg movement speed")
oppo.diff.avg.shot.and.match.movement.speed - Federer tends to hit more winners as Berdych runs more than 1m/s faster than his average match movement speed. This is a suitable variable for the HMM we would expect an increase in the difference to increase Federer’s probability of being in an attacking state.
ggplot(fed16_only_plots, aes(x=factor(winner),y=oppo.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by difference in Opponent current/match avg movement speed")
ggplot(fed16_only_plots, aes(x=factor(winner),y=oppo.diff.avg.shot.and.match.movement.speed, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by difference in Opponent current/match avg movement speed")
###Number of times the player changes side during the rally p.rally.side.change.count - Not suitable
ggplot(fed16_only_plots, aes(x=p.rally.side.change.count, y=winner)) + geom_bar(stat="identity") + xlab("Rally Side Change Count") + ylab("Proportion of Winners") + ggtitle("Winners By Number of Times Federer Changes Side in Rally")
oppo.rally.side.change.count - Might be suitable when looking at a greater number of rallies.
ggplot(fed16_only_plots, aes(x=oppo.rally.side.change.count, y=winner)) + geom_bar(stat="identity") + xlab("Rally Side Change Count") + ylab("Proportion of Winners") + ggtitle("Winners By Number of Times Berdych Changes Side in Rally")
p.movement.angle.1 - more than 75% of Federer’s Winners come when he is running less than 100 degrees in relation to the baseline. This variable appears suitable for HMM we would expect as Federer’s movement angle increases beyond 100 degrees the probability the he is an attacking state decreases.
ggplot(fed16_only_plots, aes(x=p.movement.angle.1,y=winner)) + geom_smooth() + xlab("Angle Made By Federer's movement") + ylab("Proportion of Winners") + ggtitle("Winners By Federer's start to end movement angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only, aes(x=factor(winner),y=p.movement.angle.1, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Federer's start to end movement angle")
ggplot(fed16_only, aes(x=factor(winner),y=p.movement.angle.1, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by Federer's start to end movement angle")
# ggplot(fed16_only, aes(x=factor(winner.return.error),y=p.movement.angle.1, fill=factor(winner.return.error))) + geom_violin(scale = "area") + ggtitle("Winner by Federer's start to end movement angle")
oppo.movement.angle.1 - Need more evidence to include in model
ggplot(fed16_only_plots, aes(x=oppo.movement.angle.1,y=winner)) + geom_smooth() + xlab("Angle Made By Berdych's movement") + ylab("Proportion of Winners") + ggtitle("Winners By Berdych's start to end movement angle during Federer's shot")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only, aes(x=factor(winner),y=oppo.movement.angle.1, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Berdych's start to end movement angle during Federer's shot")
ggplot(fed16_only, aes(x=factor(winner),y=oppo.movement.angle.1, fill=factor(winner))) + geom_violin(scale = "area") + ggtitle("Winner by Berdych's start to end movement angle during Federer's shot")
lag.oppo.height.off.net - Federer tends to hit more winners when incoming shot approaches 50cm off the height of the net. This decreases as the height of the incoming shot increases or decreases. Including this variable in the HMM. Expect the coefficient to be small.
ggplot(fed16_only_plots, aes(x=lag.oppo.height.off.net,y=winner)) + geom_smooth() + xlab("Height of opponents shot") + ylab("Proportion of Winners") + ggtitle("Winners by Height over Net of Most recent Opponent shot")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
###The difference in players current shot speed from the match average diff.p.avg.and.current.shot.speed
ggplot(fed16_only, aes(x=factor(winner),y=diff.p.avg.and.current.shot.speed, fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by player current/average shot speed difference")
The appendix contains:
-The Depmix models and visual analysis of the state probabilities of these models.
-The step-wise analysis to find the best covariates for the Depmix models.
-Additional visual analysis of variables.
Here is a series of plots showing:
-Shot type by shot x/y coordinates
-A recreation of plots from the winter project on different match data.
#Shot Type by coordinates
ggplot(dlb_fedonly, aes(x=start.x,y=start.y)) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point() +
xlab("X Coordinate") +
ylab("Y Coordinate") +
ggtitle("Federer Down the Line Backhands By Shot Co-Ordinates")
ggplot(iof_fedonly, aes(x=start.x,y=start.y)) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point() +
xlab("X Coordinate") +
ylab("Y Coordinate") +
ggtitle("Federer Inside out Forehands By Shot Co-Ordinates")
ggplot(dlw_fedonly, aes(x=start.x,y=start.y)) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point() +
xlab("X Coordinate") +
ylab("Y Coordinate") +
ggtitle("Federer Down the Line Winners By Shot Co-Ordinates")
# #Movement and shot position seperated by forehand and backhand
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer winners by start position seperated by shot type for long rallies")
#
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer winners by end position seperated by shot type for long rallies")
#
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=is.good)) + stat_summary_hex(fun = function(is.good) sum(is.good)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer errors by start position seperated by shot type for long rallies")
#
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=is.good)) + stat_summary_hex(fun = function(is.good) sum(is.good)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer errors by end position seperated by shot type for long rallies")
#
# ggplot(longral_df, aes(x=p.start.position.x,y=p.start.position.y,z=time.to.net)) + stat_summary_hex(fun = function(time.to.net) sum(time.to.net)) + facet_wrap(~factor(hitpoint)) + xlab("Player start Position X ") + ylab("Player start Position Y") + ggtitle("Federer shot time to net by start position seperated by shot type for long rallies")
#
# ggplot(longral_df, aes(x=p.end.position.x,y=p.end.position.y,z=time.to.net)) + stat_summary_hex(fun = function(time.to.net) sum(time.to.net)) + facet_wrap(~factor(hitpoint)) + xlab("Player end Position X ") + ylab("Player end Position Y") + ggtitle("Federer shot time to net by end position seperated by shot type for long rallies")
#Visulaising speed ratio
ggplot(fed16_df, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For match")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
ggplot(fed16_only, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Fed Only Speed ratio by distance from the baseline For match")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(longral_df, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For Long Rallies")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_longest, aes(x=base.dist,y=speed.ratio)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Speed Ratio") + ggtitle("Speed ratio by distance from the baseline For Longest Rally")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#Visualising player shot angles
ggplot(fed16_df, aes(x=base.dist,y=p.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Federers Shot") + ggtitle("Federer Shot Angle by distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
ggplot(fed16_only, aes(x=base.dist,y=p.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Federers Shot") + ggtitle("Fed only Shot Angle by distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_df, aes(x=base.dist,y=o.angle)) + geom_smooth() + xlab("Distance from the baseline") + ylab("Angle Made By Berdych previous Shot") + ggtitle("Berdych Shot Angle by Federer distance from the baseline")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 219 rows containing non-finite values (stat_smooth).
##Plots of shot and ballmark co-ordinates, angles made by shots and speed
library(ggplot2)
#X Co-ordinate of shot
ggplot(fed16_only, aes(x=start.x,winner)) + geom_smooth() + xlab("X Coordinate of Shot") + ylab("Proportion of Winners") + ggtitle("Winners By X Co-Ordinate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#Opponents shot before
library(hexbin)
ggplot(fed16_only, aes(x=oppo.hit.x,y=oppo.hit.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate of Oppo Shot") + ylab("Y Coordinate of Oppo Shot") + ggtitle("Count of Winners By Opponent's Shot Co-Ordinates")
### Get help transforming this to proportion in each bin instead of raw count
#Fed Shots on x-y plane
ggplot(fed16_only, aes(x=start.x,y=start.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate") + ylab("Y Coordinate") + ggtitle("Winners By Shot Co-Ordinates")
#Fed Ballmark by winner in x-y plane
ggplot(fed16_only, aes(x=projected.ballmark.x,y=projected.ballmark.y,z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + xlab("X Coordinate") + ylab("Y Coordinate") + ggtitle("Winners By Shot Ballmark Co-Ordinates")
#Angle of shots
ggplot(fed16_only, aes(x=o.angle,y=winner)) + geom_smooth() + xlab("Angle Made By Opponents Shot With Fed's Previous Shot") + ylab("Proportion of Winners") + ggtitle("Winners By Opponent Shot Angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only, aes(x=p.angle,y=winner)) + geom_smooth() + xlab("Angle Made By Federer's Shot") + ylab("Proportion of Winners") + ggtitle("Winners By Federer's Shot Angle")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(fed16_only) + geom_density(aes(p.angle,group=factor(winner),color=factor(winner))) + ggtitle("Density of Winners by Fed Shot Angles")
ggplot(fed16_only) + geom_density(aes(o.angle,group=factor(winner),color=factor(winner))) + ggtitle("Density of Winners by Opponent Shot Angles")
#Winners by oppo speed
ggplot(fed16_only) + geom_density(aes(oppo.speed, group = factor(winner), color=factor(winner)))
ggplot(fed16_only, aes(x=factor(winner),y=oppo.speed,fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Opponent Speed")
#Winners by speed ratio
ggplot(fed16_only) + geom_density(aes(speed.ratio, group = factor(winner), color=factor(winner)))
ggplot(fed16_only, aes(x=factor(winner),y=speed.ratio,fill=factor(winner))) + geom_boxplot() + ggtitle("Winner by Speed Ratio")
##Depmix Models All of the depmix models run. Some three state models and step-wise analysis have been commented out because of causing issues with the knit. The code can still be run in Rmarkdown by uncommenting. Examined three data sets: Data Set One: Federer vs Berdych 2016 Data Set Two: Federer vs Berdych 2016/17/17 Data Set Three: Strycova vs Garcia 2016/17 ###Federer 2016 Data Set 1 Training Model 1 for Data Set 1 (Federer vs Berdych 2016) Training Model 1 is the best combination of variables found from running step-wise on Data Set 1
# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
ds1.mod1.2s <- depmix(winner ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed16_scale, nstates = 2, family=multinomial("identity"))
ds1.fm1.2s <- fit(ds1.mod1.2s)
## iteration 0 logLik: -77.52065
## iteration 5 logLik: -75.19778
## iteration 10 logLik: -71.93352
## iteration 15 logLik: -70.06745
## iteration 20 logLik: -68.04245
## iteration 25 logLik: -66.61357
## iteration 30 logLik: -65.83822
## iteration 35 logLik: -64.32856
## iteration 40 logLik: -60.65898
## iteration 45 logLik: -59.343
## iteration 50 logLik: -59.13124
## iteration 55 logLik: -59.03431
## iteration 60 logLik: -58.93476
## iteration 65 logLik: -58.86841
## iteration 70 logLik: -58.82553
## iteration 75 logLik: -58.78111
## iteration 80 logLik: -58.65491
## iteration 85 logLik: -58.06383
## iteration 90 logLik: -56.25788
## iteration 95 logLik: -55.54676
## iteration 100 logLik: -55.08047
## iteration 105 logLik: -54.80902
## iteration 110 logLik: -54.53245
## iteration 115 logLik: -54.35062
## iteration 120 logLik: -54.20976
## iteration 125 logLik: -54.14172
## iteration 130 logLik: -54.02673
## converged at iteration 134 with logLik: -54.01444
summary(ds1.fm1.2s)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2
## (Intercept) 0 31.0832797
## p.start.position.x 0 108.7561342
## lag.oppo.height.off.net 0 -99.0752111
## oppo.diff.avg.shot.and.match.movement.speed 0 1.5101516
## oppo.start.position.x 0 -28.7908402
## diff.p.avg.and.current.shot.speed 0 -14.1050364
## p.movement.angle.1 0 0.7189997
## Probalities at zero values of the covariates.
## 3.167402e-14 1
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2
## (Intercept) 0 -154.34203
## p.start.position.x 0 -648.26071
## lag.oppo.height.off.net 0 364.63212
## oppo.diff.avg.shot.and.match.movement.speed 0 -125.03439
## oppo.start.position.x 0 256.86055
## diff.p.avg.and.current.shot.speed 0 70.37259
## p.movement.angle.1 0 63.54975
## Probalities at zero values of the covariates.
## 1 9.3349e-68
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0.609 0.391
## St2 1.000 0.000
#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
ds1.fm1.2s_df <- posterior(ds1.fm1.2s)
ds1.fm1.2s_df <- cbind(fed16_scale, ds1.fm1.2s_df)
#dropping columns not in the fitted depmix model
ds1.fm1.2s_df <- ds1.fm1.2s_df[ -c(2, 4, 6:8, 12:14) ]
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Plotting states for training model fed16
#p.start.position.x, oppo.start.position.x, oppo.diff.avg.shot.and.match.movement.speed, p.diff.avg.shot.and.match.movement.speed, lag.p.angle, oppo.rally.side.change.count, diff.lag.oppo.avg.and.current.shot.speed, diff.p.avg.and.current.shot.speed, p.movement.angle.1, lag.oppo.height.off.net, oppo.start.position.y
probs = posterior(ds1.fm1.2s)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2'), fill=1:2, bty='n')
ggplot() +
geom_path(data=fed16_scale, aes(x = player.total.shot.number, y = winner)) +
xlab("Federer shot Number") +
ylab("Actual State") +
ggtitle("Actual state")
#plotting a matrix to identify any correlation between depmix probabilities and chosen variables
library(GGally)
## Warning: package 'GGally' was built under R version 3.5.2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
ggduo(ds1.fm1.2s_df, 1:3, 8, showStrips = FALSE)
ggduo(ds1.fm1.2s_df, 4:5, 8, showStrips = FALSE)
#showing correlation between variables and probaility of being in state 1
cor.ds1.fm1.2s <- cor(ds1.fm1.2s_df)
cor.ds1.fm1.2s_df <- cor.ds1.fm1.2s[ -c(1:5, 7) ]
# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
ds1.mod1.3s <- depmix(winner.return.error ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed16_scale, nstates = 3, family=multinomial("identity"))
ds1.fm1.3s <- fit(ds1.mod1.3s)
## iteration 0 logLik: -128.72
## iteration 5 logLik: -125.902
## iteration 10 logLik: -118.5809
## iteration 15 logLik: -107.8256
## iteration 20 logLik: -98.11433
## iteration 25 logLik: -84.66842
## iteration 30 logLik: -77.11373
## iteration 35 logLik: -74.07867
## iteration 40 logLik: -72.1316
## iteration 45 logLik: -71.34158
## iteration 50 logLik: -70.93945
## iteration 55 logLik: -70.75966
## iteration 60 logLik: -70.69247
## iteration 65 logLik: -70.66245
## iteration 70 logLik: -70.64947
## iteration 75 logLik: -70.64291
## iteration 80 logLik: -70.63956
## iteration 85 logLik: -70.63843
## iteration 90 logLik: -70.63677
## iteration 95 logLik: -70.63634
## iteration 100 logLik: -70.63633
## iteration 105 logLik: -70.63632
## iteration 110 logLik: -70.63631
## converged at iteration 114 with logLik: -70.6363
summary(ds1.fm1.3s)
## Initial state probabilties model
## pr1 pr2 pr3
## 0 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2 St3
## (Intercept) 0 189.3291 -333.198812
## p.start.position.x 0 -273.7255 -242.617301
## lag.oppo.height.off.net 0 -298.4381 228.919551
## oppo.diff.avg.shot.and.match.movement.speed 0 270.0946 288.105835
## oppo.start.position.x 0 -192.8203 191.904321
## diff.p.avg.and.current.shot.speed 0 -146.0170 -5.160971
## p.movement.angle.1 0 177.8477 296.384584
## Probalities at zero values of the covariates.
## 5.962609e-83 1 1.172278e-227
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2 St3
## (Intercept) 0 30.495955 51.354835
## p.start.position.x 0 -147.470327 5.550402
## lag.oppo.height.off.net 0 151.587286 -124.041505
## oppo.diff.avg.shot.and.match.movement.speed 0 -1.920757 -42.669313
## oppo.start.position.x 0 -15.909155 81.835023
## diff.p.avg.and.current.shot.speed 0 -11.764842 -10.982834
## p.movement.angle.1 0 -24.727070 -39.510764
## Probalities at zero values of the covariates.
## 4.975978e-23 8.731796e-10 1
##
## Transition model for state (component) 3
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2 St3
## (Intercept) 0 -164.59080 -116.87490
## p.start.position.x 0 110.77509 54.85293
## lag.oppo.height.off.net 0 -43.54872 -20.88380
## oppo.diff.avg.shot.and.match.movement.speed 0 -30.00681 -46.70615
## oppo.start.position.x 0 342.23172 308.58827
## diff.p.avg.and.current.shot.speed 0 -131.28350 -144.98309
## p.movement.angle.1 0 -62.06163 -42.12094
## Probalities at zero values of the covariates.
## 1 3.304645e-72 1.745328e-51
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.0.5 Re1.1
## St1 0.257 0.727 0.016
## St2 0.000 0.279 0.721
## St3 0.000 1.000 0.000
#plotting Data Set 1 Training Model 1 with 3 states
probs = posterior(ds1.fm1.3s)
plot((probs$state), type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2', "State2"), fill=1:3, bty='n')
ggplot() +
geom_path(data=fed16_scale, aes(x = player.total.shot.number, y = winner.return.error)) +
xlab("Federer shot Number") +
ylab("Actual State") +
ggtitle("Actual state")
###Federer Vs Berdych 2016/17/18 Data Set 2 Training Model two is the best combination of variables from running Step-wise on Data Set 2
# Variables fitted:
#p.start.position.x, lag.oppo.height.off.net, oppo.diff.avg.shot.and.match.movement.speed, oppo.start.position.x, diff.p.avg.and.current.shot.speed, p.movement.angle.1
ds2.mod1.2s <- depmix(winner ~ 1, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + p.movement.angle.1, data = fed_scale, nstates = 2, family=multinomial("identity"))
ds2.fm1.2s <- fit(ds2.mod1.2s)
## iteration 0 logLik: -193.5052
## iteration 5 logLik: -193.453
## iteration 10 logLik: -193.0896
## iteration 15 logLik: -191.1532
## iteration 20 logLik: -186.3481
## iteration 25 logLik: -183.0758
## iteration 30 logLik: -182.2625
## iteration 35 logLik: -181.9671
## iteration 40 logLik: -181.6846
## iteration 45 logLik: -181.344
## iteration 50 logLik: -180.9735
## iteration 55 logLik: -180.6409
## iteration 60 logLik: -180.396
## iteration 65 logLik: -180.2425
## iteration 70 logLik: -180.1479
## iteration 75 logLik: -180.0808
## iteration 80 logLik: -180.0245
## iteration 85 logLik: -179.9722
## iteration 90 logLik: -179.9209
## iteration 95 logLik: -179.8688
## iteration 100 logLik: -179.8142
## iteration 105 logLik: -179.7555
## iteration 110 logLik: -179.6907
## iteration 115 logLik: -179.6175
## iteration 120 logLik: -179.5334
## iteration 125 logLik: -179.4366
## iteration 130 logLik: -179.3283
## iteration 135 logLik: -179.2131
## iteration 140 logLik: -179.0959
## iteration 145 logLik: -178.9787
## iteration 150 logLik: -178.8625
## iteration 155 logLik: -178.7491
## iteration 160 logLik: -178.6395
## iteration 165 logLik: -178.5353
## iteration 170 logLik: -178.4373
## iteration 175 logLik: -178.3464
## iteration 180 logLik: -178.263
## iteration 185 logLik: -178.187
## iteration 190 logLik: -178.1185
## iteration 195 logLik: -178.0569
## iteration 200 logLik: -178.0017
## iteration 205 logLik: -177.9526
## iteration 210 logLik: -177.909
## iteration 215 logLik: -177.8704
## iteration 220 logLik: -177.8362
## iteration 225 logLik: -177.8062
## iteration 230 logLik: -177.7797
## iteration 235 logLik: -177.7565
## iteration 240 logLik: -177.7361
## iteration 245 logLik: -177.7183
## iteration 250 logLik: -177.7027
## iteration 255 logLik: -177.6891
## iteration 260 logLik: -177.6773
## iteration 265 logLik: -177.667
## iteration 270 logLik: -177.6581
## iteration 275 logLik: -177.6503
## iteration 280 logLik: -177.6435
## iteration 285 logLik: -177.6376
## iteration 290 logLik: -177.6326
## iteration 295 logLik: -177.6282
## iteration 300 logLik: -177.6243
## iteration 305 logLik: -177.621
## iteration 310 logLik: -177.6182
## iteration 315 logLik: -177.6157
## iteration 320 logLik: -177.6136
## iteration 325 logLik: -177.6118
## iteration 330 logLik: -177.6102
## iteration 335 logLik: -177.6088
## iteration 340 logLik: -177.6077
## iteration 345 logLik: -177.6066
## iteration 350 logLik: -177.6057
## iteration 355 logLik: -177.6049
## iteration 360 logLik: -177.6042
## iteration 365 logLik: -177.6037
## iteration 370 logLik: -177.6032
## iteration 375 logLik: -177.6027
## iteration 380 logLik: -177.6024
## iteration 385 logLik: -177.602
## iteration 390 logLik: -177.6018
## iteration 395 logLik: -177.6015
## iteration 400 logLik: -177.6013
## iteration 405 logLik: -177.6011
## iteration 410 logLik: -177.601
## iteration 415 logLik: -177.6008
## iteration 420 logLik: -177.6007
## iteration 425 logLik: -177.6006
## iteration 430 logLik: -177.6005
## iteration 435 logLik: -177.6005
## iteration 440 logLik: -177.6004
## iteration 445 logLik: -177.6003
## iteration 450 logLik: -177.6003
## iteration 455 logLik: -177.6002
## iteration 460 logLik: -177.6002
## iteration 465 logLik: -177.6002
## iteration 470 logLik: -177.6002
## iteration 475 logLik: -177.6001
## iteration 480 logLik: -177.6001
## iteration 485 logLik: -177.6001
## iteration 490 logLik: -177.6001
## iteration 495 logLik: -177.6001
## iteration 500 logLik: -177.6001
summary(ds2.fm1.2s)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2
## (Intercept) 0 5.3549862
## p.start.position.x 0 1.9884892
## lag.oppo.height.off.net 0 2.2153863
## oppo.diff.avg.shot.and.match.movement.speed 0 -2.7077086
## oppo.start.position.x 0 -1.3917780
## diff.p.avg.and.current.shot.speed 0 -1.9346839
## p.movement.angle.1 0 -0.6188697
## Probalities at zero values of the covariates.
## 0.004702319 0.9952977
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed +
## oppo.start.position.x + diff.p.avg.and.current.shot.speed +
## p.movement.angle.1
## Coefficients:
## St1 St2
## (Intercept) 0 1.27833029
## p.start.position.x 0 -0.10527171
## lag.oppo.height.off.net 0 -0.07047395
## oppo.diff.avg.shot.and.match.movement.speed 0 -1.19928715
## oppo.start.position.x 0 0.57717782
## diff.p.avg.and.current.shot.speed 0 0.61862092
## p.movement.angle.1 0 1.44618426
## Probalities at zero values of the covariates.
## 0.2178346 0.7821654
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0 1
## St2 1 0
probs = posterior(ds2.fm1.2s)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2'), fill=1:2, bty='n')
#Plotting probability states for test model fed16/17/18
#p.start.position.x, oppo.start.position.x, oppo.diff.avg.shot.and.match.movement.speed, p.diff.avg.shot.and.match.movement.speed, lag.p.angle, oppo.rally.side.change.count, diff.lag.oppo.avg.and.current.shot.speed, diff.p.avg.and.current.shot.speed, p.movement.angle.1, lag.oppo.height.off.net, oppo.start.position.y
#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
ds2.fm1.2s_df <- posterior(ds2.fm1.2s)
ds2.fm1.2s_df <- cbind(fed_scale, ds2.fm1.2s_df)
#dropping columns not in the fitted depmix model
# ds2.fm1.2s_df <- ds2.fm1.2s_df[ -c(2, 6) ]
#plotting a matrix to identify any correlation between depmix probabilities and chosen variables
library(GGally)
# ggduo(ds2.fm1.2s_df, 1:3, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s_df, 4:6, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s_df, 7:9, 14, showStrips = FALSE)
# ggduo(ds2.fm1.2s, 10:11, 14, showStrips = FALSE)
#showing correlation between variables and probaility of being in state 1
# cor.ds2.fm1.2s <- cor(ds2.fm1.2s_df)
# cor.ds2.fm1.2s_df <- as.data.frame(apply(cor.ds2.fm1.2s, 2, function(x) ifelse (abs(x) >=0-1,x,"NA")))
# cor.ds2.fm1.2s_df <- cor.ds2.fm1.2s_df[ -c(1:13, 15) ]
#Training Model 2 has the best variables from stepwise run on Data Set 2
#p.movement.angle.1, oppo.diff.avg.shot.and.match.movement.speed, p.start.position.y, oppo.start.position.x, oppo.start.position.y, p.diff.avg.shot.and.match.movement.speed, lag.oppo.height.off.net
#running this model on the scaled fed_only data set
ds2.mod2.2s <- depmix(winner ~ 1, transition = ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 2, family=multinomial("identity"))
ds2.fm2.2s <- fit(ds2.mod2.2s)
## iteration 0 logLik: -193.5032
## iteration 5 logLik: -193.4286
## iteration 10 logLik: -192.9374
## iteration 15 logLik: -190.6037
## iteration 20 logLik: -185.7654
## iteration 25 logLik: -182.6925
## iteration 30 logLik: -181.4189
## iteration 35 logLik: -180.5176
## iteration 40 logLik: -179.7856
## iteration 45 logLik: -179.357
## iteration 50 logLik: -179.1796
## iteration 55 logLik: -179.0803
## iteration 60 logLik: -178.8923
## iteration 65 logLik: -178.2403
## iteration 70 logLik: -177.312
## iteration 75 logLik: -176.6445
## iteration 80 logLik: -176.1105
## iteration 85 logLik: -175.6443
## iteration 90 logLik: -175.2695
## iteration 95 logLik: -174.9793
## iteration 100 logLik: -174.7172
## iteration 105 logLik: -174.4061
## iteration 110 logLik: -173.9563
## iteration 115 logLik: -173.1872
## iteration 120 logLik: -172.2315
## iteration 125 logLik: -171.6512
## iteration 130 logLik: -171.1833
## iteration 135 logLik: -170.6749
## iteration 140 logLik: -170.2238
## iteration 145 logLik: -169.744
## iteration 150 logLik: -169.4614
## iteration 155 logLik: -169.2892
## iteration 160 logLik: -169.192
## iteration 165 logLik: -169.1299
## iteration 170 logLik: -169.0759
## iteration 175 logLik: -169.0327
## iteration 180 logLik: -168.9982
## iteration 185 logLik: -168.9677
## iteration 190 logLik: -168.9502
## iteration 195 logLik: -168.9239
## iteration 200 logLik: -168.9185
## converged at iteration 201 with logLik: -168.9185
summary(ds2.fm2.2s)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2
## (Intercept) 0 151.96159
## p.movement.angle.1 0 52.74108
## oppo.diff.avg.shot.and.match.movement.speed 0 -222.72829
## p.start.position.y 0 15.31235
## oppo.start.position.x 0 -232.65781
## oppo.start.position.y 0 111.45715
## p.diff.avg.shot.and.match.movement.speed 0 325.63549
## lag.oppo.height.off.net 0 357.17549
## Probalities at zero values of the covariates.
## 1.009064e-66 1
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2
## (Intercept) 0 -52.067848
## p.movement.angle.1 0 6.134092
## oppo.diff.avg.shot.and.match.movement.speed 0 -3.716893
## p.start.position.y 0 5.012778
## oppo.start.position.x 0 58.534178
## oppo.start.position.y 0 3.594212
## p.diff.avg.shot.and.match.movement.speed 0 5.419264
## lag.oppo.height.off.net 0 -15.743651
## Probalities at zero values of the covariates.
## 1 2.439051e-23
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0.745 0.255
## St2 0.978 0.022
probs = posterior(ds2.fm2.2s)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2'), fill=1:2, bty='n')
#Plotting probability states for test model 2 for fed16/17/18
#Pulling state probabilities from fitted model to add to a dataframe to look for correlation with fitted variables
#ds2.fm2.2s_df <- posterior(ds2.fm2.2s)
#ds2.fm2.2s_df <- cbind(fed_scale, ds2.fm2.2s_df)
#dropping columns not in the fitted depmix model
#ds2.fm2.2s_df <- ds2.fm2.2s_df[ -c() ]
#plotting a matrix to identify any correlation between depmix probabilities and chosen variables
# library(GGally)
# ggduo(ds2.fm2.2s_df, 1:3, 13, showStrips = FALSE)
# ggduo(ds2.fm2.2s_df, 4:6, 13, showStrips = FALSE)
#showing correlation between variables and probaility of being in state 1 for test model 2
# cor.ds2.fm2.2s <- cor(ds2.fm2.2s_df)
# cor.ds2.fm2.2s_df <- as.data.frame(apply(cor.ds2.fm2.2s, 2, function(x) ifelse (abs(x) >=-1,x,"NA")))
# cor.ds2.fm2.2s_df <- cor.ds2.fm2.2s[ -c() ]
`` ###Data Set 2 Training Model 2 (3 States)
#running this model on the scaled fed_only data set
ds2.mod2.3s <- depmix(winner.return.error ~ 1, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net , data = fed_scale, nstates = 3, family=multinomial("identity"))
ds2.fm2.3s <- fit(ds2.mod2.3s)
## iteration 0 logLik: -362.5245
## iteration 5 logLik: -362.0193
## iteration 10 logLik: -360.1707
## iteration 15 logLik: -356.5291
## iteration 20 logLik: -353.289
## iteration 25 logLik: -351.0103
## iteration 30 logLik: -348.7662
## iteration 35 logLik: -345.3797
## iteration 40 logLik: -339.9362
## iteration 45 logLik: -333.4647
## iteration 50 logLik: -328.9034
## iteration 55 logLik: -324.5113
## iteration 60 logLik: -319.2431
## iteration 65 logLik: -314.7265
## iteration 70 logLik: -310.1458
## iteration 75 logLik: -305.547
## iteration 80 logLik: -302.403
## iteration 85 logLik: -301.0839
## iteration 90 logLik: -300.4038
## iteration 95 logLik: -299.8475
## iteration 100 logLik: -298.4973
## iteration 105 logLik: -297.4937
## iteration 110 logLik: -296.8352
## iteration 115 logLik: -296.576
## iteration 120 logLik: -296.3748
## iteration 125 logLik: -296.2295
## iteration 130 logLik: -296.1044
## iteration 135 logLik: -296.0123
## iteration 140 logLik: -295.9362
## iteration 145 logLik: -295.857
## iteration 150 logLik: -295.8008
## iteration 155 logLik: -295.7101
## iteration 160 logLik: -295.6686
## iteration 165 logLik: -295.6201
## iteration 170 logLik: -295.5736
## iteration 175 logLik: -295.5364
## iteration 180 logLik: -295.4914
## iteration 185 logLik: -295.471
## iteration 190 logLik: -295.4475
## iteration 195 logLik: -295.4322
## iteration 200 logLik: -295.4082
## iteration 205 logLik: -295.3905
## iteration 210 logLik: -295.3746
## Warning in em.depmix(object = object, maxit = emcontrol$maxit, tol =
## emcontrol$tol, : likelihood decreased on iteration 212
summary(ds2.fm2.3s)
## Initial state probabilties model
## pr1 pr2 pr3
## 0 0 1
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2 St3
## (Intercept) 0 -27.6037580 3.9045140
## p.movement.angle.1 0 -7.3637171 -6.8871059
## oppo.diff.avg.shot.and.match.movement.speed 0 -21.0188105 -9.1958914
## p.start.position.y 0 0.3896248 2.0294815
## oppo.start.position.x 0 44.8326208 -0.7108344
## oppo.start.position.y 0 -3.1670927 -7.2587288
## p.diff.avg.shot.and.match.movement.speed 0 26.2285021 14.9131106
## lag.oppo.height.off.net 0 -26.2464468 13.5404045
## Probalities at zero values of the covariates.
## 0.01975271 2.029864e-14 0.9802473
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2 St3
## (Intercept) 0 551.91342 400.36869
## p.movement.angle.1 0 -87.96094 -153.00183
## oppo.diff.avg.shot.and.match.movement.speed 0 389.09377 473.90423
## p.start.position.y 0 -83.04395 -94.30928
## oppo.start.position.x 0 -454.92390 -220.24288
## oppo.start.position.y 0 -605.43593 -244.06699
## p.diff.avg.shot.and.match.movement.speed 0 -574.22139 -716.03481
## lag.oppo.height.off.net 0 -172.18652 -800.69210
## Probalities at zero values of the covariates.
## 2.027903e-240 1 1.530946e-66
##
## Transition model for state (component) 3
## Model of type multinomial (mlogit), formula: ~p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed +
## p.start.position.y + oppo.start.position.x + oppo.start.position.y +
## p.diff.avg.shot.and.match.movement.speed + lag.oppo.height.off.net
## Coefficients:
## St1 St2 St3
## (Intercept) 0 279.49155 -485.0889
## p.movement.angle.1 0 17.10067 -336.7601
## oppo.diff.avg.shot.and.match.movement.speed 0 60.95021 672.9418
## p.start.position.y 0 -55.13964 -228.5750
## oppo.start.position.x 0 -199.15758 190.5777
## oppo.start.position.y 0 28.57099 132.4371
## p.diff.avg.shot.and.match.movement.speed 0 -740.81958 -128.0758
## lag.oppo.height.off.net 0 -419.15452 147.1055
## Probalities at zero values of the covariates.
## 4.153003e-122 1 0
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.-1 Re1.0 Re1.1
## St1 0.063 0.937 0.000
## St2 0.355 0.568 0.077
## St3 0.000 0.628 0.372
#plotting Data Set 2 Training Model 2 with 3 states
probs = posterior(ds2.fm2.3s)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2', "State3"), fill=1:3, bty='n')
ggplot() +
geom_path(data=fed_scale, aes(x = player.total.shot.number, y = winner.return.error)) +
xlab("Federer shot Number") +
ylab("Actual State") +
ggtitle("Actual state")
probs = posterior(ds3.fm2a.3s)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2', "State3"), fill=1:3, bty='n')
###Training Model 3 Data Set 3 Commented out due to causing errors with knit.
#t1m1
library(depmixS4)
library(dplyr)
t1.mod1 <- depmix(winner ~ 1, transition = ~ oppo.speed + ser1 + ser2 + start.x, data = fed16_only, nstates = 2, family=multinomial("identity"))
t1.fm1 <- fit(t1.mod1)
## iteration 0 logLik: -77.76366
## iteration 5 logLik: -77.48424
## iteration 10 logLik: -76.90073
## iteration 15 logLik: -76.14037
## iteration 20 logLik: -75.62631
## iteration 25 logLik: -75.35881
## iteration 30 logLik: -75.11051
## iteration 35 logLik: -74.79877
## iteration 40 logLik: -74.43574
## iteration 45 logLik: -74.04317
## iteration 50 logLik: -73.6559
## iteration 55 logLik: -73.31353
## iteration 60 logLik: -73.03826
## iteration 65 logLik: -72.82036
## iteration 70 logLik: -72.63312
## iteration 75 logLik: -72.45446
## iteration 80 logLik: -72.26937
## iteration 85 logLik: -72.07499
## iteration 90 logLik: -71.89061
## iteration 95 logLik: -71.73901
## iteration 100 logLik: -71.62493
## iteration 105 logLik: -71.54425
## iteration 110 logLik: -71.48966
## iteration 115 logLik: -71.45382
## iteration 120 logLik: -71.43074
## iteration 125 logLik: -71.41607
## iteration 130 logLik: -71.40677
## iteration 135 logLik: -71.401
## iteration 140 logLik: -71.3974
## iteration 145 logLik: -71.39516
## iteration 150 logLik: -71.39376
## iteration 155 logLik: -71.39287
## iteration 160 logLik: -71.39232
## iteration 165 logLik: -71.39198
## iteration 170 logLik: -71.39177
## iteration 175 logLik: -71.39164
## iteration 180 logLik: -71.39156
## iteration 185 logLik: -71.3915
## iteration 190 logLik: -71.39147
## iteration 195 logLik: -71.39145
## iteration 200 logLik: -71.39144
## iteration 205 logLik: -71.39143
## iteration 210 logLik: -71.39143
## converged at iteration 214 with logLik: -71.39142
summary(t1.fm1)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x
## Coefficients:
## St1 St2
## (Intercept) 0 6.1234140
## oppo.speed 0 -0.0175171
## ser1 0 -2.0220807
## ser2 0 8.1454947
## start.x 0 0.2116409
## Probalities at zero values of the covariates.
## 0.002186173 0.9978138
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x
## Coefficients:
## St1 St2
## (Intercept) 0 1.25959779
## oppo.speed 0 -0.06784264
## ser1 0 0.54424531
## ser2 0 0.71535248
## start.x 0 -0.16521021
## Probalities at zero values of the covariates.
## 0.2210431 0.7789569
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0 1
## St2 1 0
library(depmixS4)
t2.mod1 <- depmix(list(winner ~ 1, speed.ratio ~ 1), transition = ~ oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x + lag.p.angle + lag.speed.ratio, data = fed16_only, nstates = 2, family=list(multinomial("identity"), gaussian()))
t2.fm1<- fit(t2.mod1)
## iteration 0 logLik: -143.5021
## iteration 5 logLik: -126.2489
## iteration 10 logLik: -123.843
## iteration 15 logLik: -123.154
## iteration 20 logLik: -122.5015
## iteration 25 logLik: -121.0058
## iteration 30 logLik: -120.641
## iteration 35 logLik: -120.5495
## iteration 40 logLik: -120.4756
## iteration 45 logLik: -120.4366
## iteration 50 logLik: -120.4199
## iteration 55 logLik: -120.412
## iteration 60 logLik: -120.4066
## iteration 65 logLik: -120.4012
## iteration 70 logLik: -120.3918
## iteration 75 logLik: -120.3746
## iteration 80 logLik: -120.3439
## iteration 85 logLik: -120.2986
## iteration 90 logLik: -120.2568
## iteration 95 logLik: -120.234
## iteration 100 logLik: -120.2254
## iteration 105 logLik: -120.2221
## iteration 110 logLik: -120.2206
## iteration 115 logLik: -120.2195
## iteration 120 logLik: -120.2187
## iteration 125 logLik: -120.2182
## iteration 130 logLik: -120.2179
## converged at iteration 132 with logLik: -120.2179
summary(t2.fm1)
## Initial state probabilties model
## pr1 pr2
## 0 1
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x +
## lag.p.angle + lag.speed.ratio
## Coefficients:
## St1 St2
## (Intercept) 0 5.097239
## oppo.speed 0 -2.769014
## ser1 0 -80.897804
## ser2 0 85.995043
## start.x 0 32.533365
## o.angle 0 13.544531
## oppo.hit.x 0 51.275271
## lag.p.angle 0 -4.068974
## lag.speed.ratio 0 -88.178116
## Probalities at zero values of the covariates.
## 0.006076456 0.9939235
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~oppo.speed + ser1 + ser2 + start.x + o.angle + oppo.hit.x +
## lag.p.angle + lag.speed.ratio
## Coefficients:
## St1 St2
## (Intercept) 0 -8.3734301
## oppo.speed 0 0.3738778
## ser1 0 -5.5242473
## ser2 0 -2.8491828
## start.x 0 0.6478550
## o.angle 0 -0.1590219
## oppo.hit.x 0 0.5840919
## lag.p.angle 0 0.0375421
## lag.speed.ratio 0 2.1257951
## Probalities at zero values of the covariates.
## 0.9997691 0.0002308688
##
##
## Response parameters
## Resp 1 : multinomial
## Resp 2 : gaussian
## Re1.0 Re1.1 Re2.(Intercept) Re2.sd
## St1 0.763 0.237 1.262 0.548
## St2 0.877 0.123 0.902 0.219
##Stepwise for first best variable
#run through all variables to find highest log lik
fcov1 <- lapply(fed16_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 160 with logLik: -69.6267
## converged at iteration 490 with logLik: -75.29084
## converged at iteration 204 with logLik: -71.53859
## converged at iteration 300 with logLik: -74.05906
## converged at iteration 378 with logLik: -70.67435
## converged at iteration 274 with logLik: -73.30207
## converged at iteration 257 with logLik: -73.03561
## converged at iteration 230 with logLik: -70.70626
#Pulling covergence log likelihoods into a dataframe
fcov1_df <- as.data.frame(
c(logLik(fcov1$p.start.position.x), logLik(fcov1$p.start.position.y),
logLik(fcov1$oppo.start.position.x), logLik(fcov1$oppo.start.position.y),
logLik(fcov1$p.movement.angle.1),
logLik(fcov1$p.diff.avg.shot.and.match.movement.speed),
logLik(fcov1$oppo.diff.avg.shot.and.match.movement.speed),
logLik(fcov1$diff.p.avg.and.current.shot.speed),
logLik(fcov1$lag.oppo.height.off.net)
)
)
fcov1_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(fcov1_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov1_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -69.6267 p.start.position.x
###First best variable is p.start.position.x
#Running single variable model of p.start.position.x to look at coefficients
mod1 <- depmix(winner ~ 1, transition = ~ p.start.position.x, data = fed16_scale, nstates = 2, family=multinomial("identity"))
fm1 <- fit(mod1)
## iteration 0 logLik: -77.78931
## iteration 5 logLik: -77.71084
## iteration 10 logLik: -77.57877
## iteration 15 logLik: -77.45259
## iteration 20 logLik: -77.32209
## iteration 25 logLik: -77.16141
## iteration 30 logLik: -76.95374
## iteration 35 logLik: -76.69281
## iteration 40 logLik: -76.41159
## iteration 45 logLik: -76.17172
## iteration 50 logLik: -75.98834
## iteration 55 logLik: -75.79927
## iteration 60 logLik: -75.09728
## iteration 65 logLik: -73.06424
## iteration 70 logLik: -72.27577
## iteration 75 logLik: -71.85849
## iteration 80 logLik: -71.39553
## iteration 85 logLik: -70.94939
## iteration 90 logLik: -70.63778
## iteration 95 logLik: -70.4384
## iteration 100 logLik: -70.30736
## iteration 105 logLik: -70.21648
## iteration 110 logLik: -70.15506
## iteration 115 logLik: -70.14649
## iteration 120 logLik: -70.14242
## iteration 125 logLik: -70.14049
## iteration 130 logLik: -70.13959
## iteration 135 logLik: -70.13917
## iteration 140 logLik: -70.13897
## iteration 145 logLik: -70.13888
## iteration 150 logLik: -70.13887
## converged at iteration 151 with logLik: -70.13887
summary(fm1)
## Initial state probabilties model
## pr1 pr2
## 1 0
##
## Transition model for state (component) 1
## Model of type multinomial (mlogit), formula: ~p.start.position.x
## Coefficients:
## St1 St2
## (Intercept) 0 -38.20915
## p.start.position.x 0 232.47106
## Probalities at zero values of the covariates.
## 1 2.546701e-17
##
## Transition model for state (component) 2
## Model of type multinomial (mlogit), formula: ~p.start.position.x
## Coefficients:
## St1 St2
## (Intercept) 0 4.469282
## p.start.position.x 0 -19.593038
## Probalities at zero values of the covariates.
## 0.01132579 0.9886742
##
##
## Response parameters
## Resp 1 : multinomial
## Re1.0 Re1.1
## St1 0.593 0.407
## St2 0.981 0.019
#Graphing fitted model states for first best variable
probs = posterior(fm1)
plot(probs$state, type='s', main='Implied States', xlab='', ylab='State')
matplot(probs[,-1], type='l', main='Probabilities', ylab='Probability')
legend(x='topright', c('State1','State2'), fill=1:2, bty='n')
##Stepwise to find second best variable
#run through all variables to find highest log lik
fcov2 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 218 with logLik: -70.0557
## converged at iteration 80 with logLik: -66.88238
## converged at iteration 235 with logLik: -70.79542
## converged at iteration 291 with logLik: -69.94087
## converged at iteration 163 with logLik: -67.95979
## converged at iteration 143 with logLik: -69.06322
## converged at iteration 108 with logLik: -67.64185
#Pulling covergence log likelihoods into a dataframe
fcov2_df <- as.data.frame(
c(logLik(fcov2$p.start.position.y),
logLik(fcov2$oppo.start.position.x), logLik(fcov2$oppo.start.position.y),
logLik(fcov2$p.movement.angle.1),
logLik(fcov2$oppo.diff.avg.shot.and.match.movement.speed),
logLik(fcov2$diff.p.avg.and.current.shot.speed),
logLik(fcov2$lag.oppo.height.off.net)
)
)
fcov2_df$newcolumn<-c( "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(fcov2_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov2_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -66.88238 oppo.start.position.x
##Stepwise to find the third best variable
#second best variable is lag.oppo.height.off.net
#run through all variables to find highest log lik
fcov3 <- lapply(fed16_scale[c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 191 with logLik: -66.33359
## converged at iteration 181 with logLik: -66.32436
## converged at iteration 175 with logLik: -66.84969
## converged at iteration 260 with logLik: -64.27658
## converged at iteration 158 with logLik: -62.66555
## converged at iteration 161 with logLik: -65.57351
#Pulling covergence log likelihoods into a dataframe
fcov3_df <- as.data.frame(
c(logLik(fcov3$p.start.position.y),
logLik(fcov3$oppo.start.position.x), logLik(fcov3$oppo.start.position.y),
logLik(fcov3$p.movement.angle.1),
logLik(fcov3$oppo.diff.avg.shot.and.match.movement.speed),
logLik(fcov3$diff.p.avg.and.current.shot.speed) )
)
fcov3_df$newcolumn<-c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed")
names(fcov3_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov3_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -62.66555 oppo.diff.avg.shot.and.match.movement.speed
##Stepwise to find the fourth best variable
library(depmixS4)
#third best variable is oppo.diff.avg.shot.and.match.movement.speed
#run through all variables to find highest log lik
fcov4 <- lapply(fed16_scale[c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 134 with logLik: -62.01148
## converged at iteration 141 with logLik: -61.90335
## converged at iteration 107 with logLik: -62.56972
## converged at iteration 253 with logLik: -62.3559
## converged at iteration 94 with logLik: -62.08507
#Pulling covergence log likelihoods into a dataframe
fcov4_df <- as.data.frame(
c(logLik(fcov4$p.start.position.y),
logLik(fcov4$oppo.start.position.x), logLik(fcov4$oppo.start.position.y),
logLik(fcov4$p.movement.angle.1),
logLik(fcov4$diff.p.avg.and.current.shot.speed) )
)
fcov4_df$newcolumn<-c("p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")
names(fcov4_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov4_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -61.90335 oppo.start.position.x
##Stepwise to find fifth best variable
#Fourth best variable is oppo.start.position.x
#run through all variables to find highest log lik
fcov5 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 109 with logLik: -61.46449
## converged at iteration 172 with logLik: -61.89461
## converged at iteration 79 with logLik: -60.87455
## converged at iteration 142 with logLik: -60.51499
#Pulling covergence log likelihoods into a dataframe
fcov5_df <- as.data.frame(
c(logLik(fcov5$p.start.position.y), logLik(fcov5$oppo.start.position.y),
logLik(fcov5$p.movement.angle.1),
logLik(fcov5$diff.p.avg.and.current.shot.speed) )
)
fcov5_df$newcolumn<-c("p.start.position.y", "oppo.start.position.y", "p.movement.angle.1", "diff.p.avg.and.current.shot.speed")
names(fcov5_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov5_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -60.51499 diff.p.avg.and.current.shot.speed
##Stepwise to find sixth best variable
#fifth best variable is diff.p.avg.and.current.shot.speed
#run through all variables to find highest log lik
fcov6 <- lapply(fed16_scale[c( "p.start.position.y", "oppo.start.position.y", "p.movement.angle.1")], function(w) fit((depmix(winner ~ 1, data = fed16_scale, transition = ~ p.start.position.x + lag.oppo.height.off.net + oppo.diff.avg.shot.and.match.movement.speed + oppo.start.position.x + diff.p.avg.and.current.shot.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 110 with logLik: -61.08021
## converged at iteration 111 with logLik: -60.53194
#Pulling covergence log likelihoods into a dataframe
fcov6_df <- as.data.frame(
c(logLik(fcov6$p.start.position.y), logLik(fcov6$oppo.start.position.y),
logLik(fcov6$p.movement.angle.1) )
)
fcov6_df$newcolumn<-c("p.start.position.y", "oppo.start.position.y", "p.movement.angle.1")
names(fcov6_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
fcov6_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -54.00198 p.movement.angle.1
#Model stops improving last variable added to the model is p.movement.angle.1
#run through all variables to find highest log lik
#tm2.fcov1 is training model 2 fitted covariates 1
tm2.fcov1 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 415 with logLik: -187.7404
#Pulling covergence log likelihoods into a dataframe
tm2.fcov1_df <- as.data.frame(
c(logLik(tm2.fcov1$p.start.position.x), logLik(tm2.fcov1$p.start.position.y),
logLik(tm2.fcov1$oppo.start.position.x), logLik(tm2.fcov1$oppo.start.position.y),
logLik(tm2.fcov1$p.movement.angle.1),
logLik(tm2.fcov1$p.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov1$oppo.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov1$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov1$lag.oppo.height.off.net) )
)
tm2.fcov1_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.movement.angle.1", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov1_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov1_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -181.9063 p.movement.angle.1
#first best variable p.movement.angle.1
#run through all variables to find highest log lik
tm2.fcov2 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 282 with logLik: -183.4989
## converged at iteration 1 with logLik: -193.5096
## converged at iteration 316 with logLik: -176.6738
## converged at iteration 461 with logLik: -180.3282
## converged at iteration 254 with logLik: -179.3088
#Pulling covergence log likelihoods into a dataframe
tm2.fcov2_df <- as.data.frame(
c(logLik(tm2.fcov2$p.start.position.x), logLik(tm2.fcov2$p.start.position.y),
logLik(tm2.fcov2$oppo.start.position.x), logLik(tm2.fcov2$oppo.start.position.y),
logLik(tm2.fcov2$p.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov2$oppo.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov2$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov2$lag.oppo.height.off.net) )
)
tm2.fcov2_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "oppo.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov2_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov2_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -176.6738 oppo.diff.avg.shot.and.match.movement.speed
#second best variable is oppo.diff.avg.shot.and.match.movement.speed
#run through all variables to find highest log lik
tm2.fcov3 <- lapply(fed_scale[c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 473 with logLik: -178.4031
## converged at iteration 270 with logLik: -175.8243
## converged at iteration 241 with logLik: -177.3179
## converged at iteration 376 with logLik: -181.2354
## converged at iteration 441 with logLik: -178.1429
#Pulling covergence log likelihoods into a dataframe
tm2.fcov3_df <- as.data.frame(
c(logLik(tm2.fcov3$p.start.position.x), logLik(tm2.fcov3$p.start.position.y),
logLik(tm2.fcov3$oppo.start.position.x), logLik(tm2.fcov3$oppo.start.position.y),
logLik(tm2.fcov3$p.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov3$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov3$lag.oppo.height.off.net) )
)
tm2.fcov3_df$newcolumn<-c( "p.start.position.x", "p.start.position.y", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov3_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov3_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -175.8243 p.start.position.y
#third best variable is p.start.position.y
#run through all variables to find highest log lik
tm2.fcov4 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 205 with logLik: -173.8228
## converged at iteration 335 with logLik: -174.9686
## converged at iteration 301 with logLik: -175.5103
## converged at iteration 312 with logLik: -175.0519
#Pulling covergence log likelihoods into a dataframe
tm2.fcov4_df <- as.data.frame(
c(logLik(tm2.fcov4$p.start.position.x),
logLik(tm2.fcov4$oppo.start.position.x), logLik(tm2.fcov4$oppo.start.position.y),
logLik(tm2.fcov4$p.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov4$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov4$lag.oppo.height.off.net) )
)
tm2.fcov4_df$newcolumn<-c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov4_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov4_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -173.8228 oppo.start.position.x
#fourth best variable is oppo.start.position.x
#run through all variables to find highest log lik
tm2.fcov5 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 255 with logLik: -178.546
## converged at iteration 220 with logLik: -173.8217
## converged at iteration 209 with logLik: -172.6809
## converged at iteration 398 with logLik: -172.6645
## converged at iteration 203 with logLik: -178.4566
## converged at iteration 265 with logLik: -173.5052
#Pulling covergence log likelihoods into a dataframe
tm2.fcov5_df <- as.data.frame(
c(logLik(tm2.fcov5$p.start.position.x), logLik(tm2.fcov5$oppo.start.position.y),
logLik(tm2.fcov5$p.diff.avg.shot.and.match.movement.speed),
logLik(tm2.fcov5$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov5$lag.oppo.height.off.net) )
)
tm2.fcov5_df$newcolumn<-c( "p.start.position.x", "oppo.start.position.y", "p.diff.avg.shot.and.match.movement.speed", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov5_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov5_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -172.6645 p.diff.avg.shot.and.match.movement.speed
#model stops improving at this point
#5th best variable alternates between oppo.start.position.y and p.diff.avg.shot.and.match.movement.speed. Adding both.
#run through all variables to find highest log lik
tm2.fcov6 <- lapply(fed_scale[c( "p.start.position.x", "oppo.start.position.x", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")], function(w) fit((depmix(winner ~ 1, data = fed_scale, transition = ~ p.movement.angle.1 + oppo.diff.avg.shot.and.match.movement.speed + p.start.position.y + oppo.start.position.x + oppo.start.position.y + p.diff.avg.shot.and.match.movement.speed + w, nstates = 2, family=multinomial("identity"))), verbose = FALSE, emc=em.control(rand=TRUE)))
## converged at iteration 232 with logLik: -178.3589
## converged at iteration 344 with logLik: -173.2838
## converged at iteration 205 with logLik: -177.9219
## converged at iteration 208 with logLik: -168.8888
#Pulling covergence log likelihoods into a dataframe
tm2.fcov6_df <- as.data.frame(
c(logLik(tm2.fcov6$p.start.position.x),
logLik(tm2.fcov6$diff.p.avg.and.current.shot.speed),
logLik(tm2.fcov6$lag.oppo.height.off.net) )
)
tm2.fcov6_df$newcolumn<-c( "p.start.position.x", "diff.p.avg.and.current.shot.speed", "lag.oppo.height.off.net")
names(tm2.fcov6_df) <- c("convergence.loglik", "Variables")
#print variable with the highest convergence log likelihood
tm2.fcov6_df %>%
slice(which.max(convergence.loglik))
## convergence.loglik Variables
## 1 -168.8888 lag.oppo.height.off.net
#The model stops improving after this point. Last variable added to the model is lag.oppo.height.off.net
commented out to due to causing errors with knit.
Below are:
-some of the visualisations I created when figuring out how to create the animation.
-Analysis of how the covariates and response variables move over time
-Visualisation in the differences between player positions and movement speeds.
library(grid)
# ggplot(fed16_longest, aes(x=p.start.position.x, xend=p.end.position.x, y=p.start.position.y, yend=p.end.position.y, z=winner)) + stat_summary_hex(fun = function(winner) sum(winner)) + geom_segment(arrow = arrow(angle = 15,)) + xlab("Federer Position X ") + ylab("Federer Position Y") + ggtitle("Federer positions in longest rally")
# ggplot(fed16_longest, aes(x=p.start.position.x, xend=p.end.position.x, y=p.start.position.y, yend=p.end.position.y, z=winner)) +
# stat_summary_hex(fun = function(winner) sum(winner)) +
# geom_text(aes(label=shot)) +
# geom_path(arrow = arrow(angle = 15)) +
# xlab("Federer Position X ") +
# ylab("Federer Position Y") +
# ggtitle("Federer positions in longest rally")
#static player positions
trial_df <- filter(fed16_pos, final.shot >= 14)
p.rally.paths <- ggplot(trial_df, aes(x=p.start.position.x, y=p.start.position.y)) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(aes(alpha = shot)) +
geom_path( arrow = arrow(angle = 15)) +
facet_wrap(~rally.number) +
xlab("Federer Position X ") +
ylab("Federer Position Y") +
ggtitle("Federer start and end positions for rallies 14 shots or longer")
p.rally.paths <- ggplotly(p.rally.paths)
p.rally.paths
oppo.rally.paths <- ggplot(trial_df, aes(x=oppo.start.position.x, y=oppo.start.position.y )) +
scale_x_continuous(breaks = axis_labels$x.break) +
scale_y_continuous(breaks = axis_labels$y.break) +
geom_path(data = court_trace, aes(x = x, y = y), color = 'black', size = 1, alpha = 0.75) +
geom_path(data = net_trace, aes(x = x, y = y), color = 'grey40', size = 1, lineend = 'round') +
coord_fixed() +
geom_point(aes(alpha = shot)) +
geom_path( arrow = arrow(angle = 15)) +
facet_wrap(~rally.number) + xlab("Opponent Position X ") + ylab("Opponent Position Y") +
ggtitle("Opponent start and end positions for for rallies 14 shots or longer")
oppo.rally.paths <- ggplotly(oppo.rally.paths)
oppo.rally.paths
library(dplyr)
library(tidyr)
stry_plot1<- stry_scale %>%
gather(key = vars, value = measurement, p.start.position.x, oppo.start.position.x, p.diff.avg.shot.and.match.movement.speed)
stry_plot2<- stry_scale %>%
gather(key = vars, value = measurement, p.start.position.y, lag.oppo.height.off.net)
stry_plot3<- stry_scale %>%
gather(key = vars, value = measurement, oppo.start.position.y, oppo.diff.avg.shot.and.match.movement.speed, diff.p.avg.and.current.shot.speed )
stry_plot4<- stry_scale %>%
gather(key = vars, value = measurement, winner.return.error, p.advantage)
ggplot(stry_plot1, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")
ggplot(stry_plot2, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")
ggplot(stry_plot3, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")
ggplot(stry_plot4, aes(x=player.total.shot.number,y=measurement)) + geom_line(color="blue") + facet_grid(vars~., scale="free_y") + xlab("Time") + ggtitle("Federer vs. Berdych 2016: Looking at how variables move together over the match")
####Player/Opponent position & speed differentials
#player baseline/centre distance differentials
ggplot(longral_pos, aes(x=shot,y=p.start.position.base.diff)) +
geom_point() +
geom_path() +
geom_hline(yintercept = 0, linetype = "dotdash") +
facet_wrap(~rally.number) +
xlab("Shot Number") +
ylab("Difference in distance from baseline Federer Vs Opponent") +
ggtitle("Federer vs Opponent difference in baseline distance in rallies 10 shots or more")
ggplot() +
geom_path(data=longral_pos, aes(x = shot, y = p.start.position.y, col="blue")) +
geom_path(data=longral_pos, aes(x = shot, y = oppo.start.position.y, col="red")) +
geom_hline(yintercept = 0, linetype = "dotdash") +
facet_wrap(~rally.number) +
xlab("Shot Number") +
ylab("Distance from centre for Federer and Opponent") +
ggtitle("Federer vs Opponent distance from centre in rallies 10 shots or more") +
scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))
ggplot() +
geom_path(data=longral_pos, aes(x = shot, y = p.start.position.base.dist,
yend =p.end.position.base.dist, col="blue")) +
geom_path(data=longral_pos, aes(x = shot, y = oppo.start.position.base.dist,
yend =oppo.end.position.base.dist,col="red")) +
geom_hline(yintercept = 0, linetype = "dotdash") +
facet_wrap(~rally.number) +
xlab("Shot Number") +
ylab("Distance from baseline for Federer and Opponent") +
ggtitle("Federer vs Opponent distance from baseline in rallies 10 shots or more") +
scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))
## Warning: Ignoring unknown aesthetics: yend
## Warning: Ignoring unknown aesthetics: yend
#speed/acceleration differentials
ggplot() +
geom_path(data=longral_df, aes(x = shot, y = p.avg.speed, col="blue")) +
geom_path(data=longral_df, aes(x = shot, y = oppo.avg.speed, col="red")) +
facet_wrap(~rally.number) +
xlab("Shot Number") +
ylab("Avg Movement Speed m/s Federer and Opponent") +
ggtitle("Federer vs Opponent Avg Movement Speed in Rallies 10 shots or more") +
scale_colour_manual(name = "Player", values=c("blue","red"), labels= c("Federer","Berdych"))
ggplot(longral_pos, aes(x=shot,y=avg.player.speed.diff)) +
geom_path() +
geom_hline(yintercept = 0, linetype = "dotdash") +
facet_wrap(~rally.number) +
xlab("Shot Number") +
ylab("Difference Avg Movement Speed m/s Federer vs Opponent") +
ggtitle("Federer vs Opponent difference in Avg Movement Speed m/s Federer and Opponent in rallies 10 shots or more")
We can also facet the animation by rallies. Code commented out to reduce size of html.